Abstract



Simulated annealing is a popular method for approaching the solution of a global 
optimization problem. Existing results on its performance apply to discrete com- 
binatorial optimization where the optimization variables can assume only a finite 
set of possible values. We introduce a new general formulation of simulated an- 
nealing which allows one to guarantee finite-time performance in the optimiza- 
tion of functions of continuous variables. The results hold universally for any 
optimization problem on a bounded domain and establish a connection between 
simulated annealing and up-to-date theory of convergence of Markov chain Monte 
Carlo methods on continuous domains. This work is inspired by the concept of 
finite-time learning with known accuracy and confidence developed in statistical 
learning theory. 

Optimization is the general problem of finding a value of a vector of variables $\theta$ that maximizes (or
minimizes) some scalar criterion $U(\theta)$. The set of all possible values of the vector $\theta$ is called the
optimization domain. The elements of $\theta$ can be discrete or continuous variables. In the first case
the optimization domain is usually finite, such as in the well-known traveling salesman problem; in
the second case the optimization domain is a continuous set. An important example of a continuous
optimization domain is the set of 3-D configurations of a sequence of amino-acids in the problem of
finding the minimum energy folding of the corresponding protein [1].

In principle, any optimization problem on a finite domain can be solved by an exhaustive search.
However, this is often beyond computational capacity: the optimization domain of the traveling
salesman problem with 100 cities contains more than $10^{155}$ possible tours. An efficient algorithm
to solve the traveling salesman and many similar problems has not yet been found and such
problems remain reliably solvable only in principle [2]. Statistical mechanics has inspired widely used
methods for finding good approximate solutions in hard discrete optimization problems which defy
efficient exact solutions [3, 4, 5, 6]. Here a key idea has been that of simulated annealing [3]: a
random search based on the Metropolis-Hastings algorithm, such that the distribution of the
elements of the domain visited during the search converges to an equilibrium distribution concentrated
around the global optimizers. Convergence and finite-time performance of simulated annealing on
finite domains have been evaluated in many works, e.g. [7, 8, 9, 10].

Preprint. The final version will appear in: Advances in Neural Information Processing Systems 20,
Proceedings of NIPS 2007, MIT Press.

On continuous domains, most popular optimization methods perform a local gradient-based
search and in general converge to local optimizers, with the notable exception of convex
criteria, where convergence to the unique global optimizer occurs [11]. Simulated annealing
performs a global search and can be easily implemented on continuous domains. Hence it can
be considered a powerful complement to local methods. In this paper, we introduce for the
first time rigorous guarantees on the finite-time performance of simulated annealing on
continuous domains. We will show that it is possible to derive simulated annealing algorithms
which, with an arbitrarily high level of confidence, find an approximate solution to the
problem of optimizing a function of continuous variables, within a specified tolerance to the global
optimal solution after a known finite number of steps. Rigorous guarantees on the finite-time
performance of simulated annealing in the optimization of functions of continuous variables
have never been obtained before; the only results available state that simulated annealing
converges to a global optimizer as the number of steps grows to infinity, e.g. [12, 13, 14, 15].

The background of our work is twofold. On the one hand, our notion of approximate solution to a
global optimization problem is inspired by the concept of finite-time learning with known accuracy
and confidence developed in statistical learning theory [16, 17]. We actually maintain an important
aspect of statistical learning theory which is that we do not introduce any particular assumption on
the optimization criterion, i.e. our results hold regardless of what $U$ is. On the other hand, we ground
our results on the theory of convergence, with quantitative bounds on the distance to the target
distribution, of the Metropolis-Hastings algorithm and Markov Chain Monte Carlo (MCMC) methods,
which has been one of the main achievements of recent research in statistics [18, 19, 20, 21].

In this paper, we will not develop any ready-to-use optimization algorithm. We will instead
introduce a general formulation of the simulated annealing method which allows one to derive new
simulated annealing algorithms with rigorous finite-time guarantees on the basis of existing theory.
The Metropolis-Hastings algorithm and the general family of MCMC methods have many degrees
of freedom. The choice and comparison of specific algorithms go beyond the scope of the paper.

The paper is organized in the following sections. In Simulated annealing we introduce the method 
and fix the notation. In Convergence we recall the reasons why finite-time guarantees for simulated 
annealing on continuous domains have not been obtained before. In Finite-time guarantees we 
present the main result of the paper. In Conclusions we state our findings and conclude the paper. 



1 Simulated annealing 

The original formulation of simulated annealing was inspired by the analogy between the stochastic 
evolution of the thermodynamic state of an annealing material towards the configurations of minimal 
energy and the search for the global minimum of an optimization criterion [3]. In the procedure, the
optimization criterion plays the role of the energy and the state of the annealed material is simulated 
by the evolution of the state of an inhomogeneous Markov chain. The state of the chain evolves 
according to the Metropolis-Hastings algorithm in order to simulate the Boltzmann distribution of 
thermodynamic equilibrium. The Boltzmann distribution is simulated for a decreasing sequence of 
temperatures ("cooling"). The target distribution of the cooling procedure is the limiting Boltzmann 
distribution, for the temperature that tends to zero, which takes non-zero values only on the set of 
global minimizers Q. 

The original formulation of the method was for a finite domain. However, simulated annealing can 
be generalized straightforwardly to a continuous domain because the Metropolis-Hastings algorithm 
can be used with almost no differences on discrete and continuous domains. The main difference is
that on a continuous domain the equilibrium distributions are specified by probability densities. On 
a continuous domain, Markov transition kernels in which the distribution of the elements visited by 
the chain converges to an equilibrium distribution with the desired density can be constructed using 
the Metropolis-Hastings algorithm and the general family of MCMC methods [22].

We point out that Boltzmann distributions are not the only distributions which can be adopted as 
equilibrium distributions in simulated annealing [7]. In this paper it is convenient for us to adopt a
different type of equilibrium distribution in place of Boltzmann distributions.

Figure 1: The function $U(\theta)$ (upper left) and some probability densities proportional to
$[U(\theta) + \delta]^J$ for $\delta = 0.5$ and $J = 3$ (upper right), $J = 6$ (lower left) and $J = 20$ (lower right).



1.1 Our setting 

The optimization criterion is $U : \Theta \to [0, 1]$, with $\Theta \subset \mathbb{R}^N$. The assumption that $U$ takes
values in the interval $[0, 1]$ is a technical one. It does not imply any serious loss of generality. In
general, any bounded optimization criterion can be scaled to take values in $[0, 1]$. We assume that
the optimization task is to find a global maximizer; this can be done without loss of generality. We
also assume that $\Theta$ is a bounded set.

We consider equilibrium distributions defined by probability density functions proportional to
$[U(\theta) + \delta]^J$ where $J$ and $\delta$ are two strictly positive parameters. We use $\pi^{(J)}$ to denote an equilibrium
distribution, i.e. $\pi^{(J)}(d\theta) \propto [U(\theta) + \delta]^J \pi_{Leb}(d\theta)$ where $\pi_{Leb}$ is the standard Lebesgue measure.
Here, $J^{-1}$ plays the role of the temperature: if the function $U(\theta)$ (Figure 1a) plus $\delta$ is taken to a
positive power $J$ then as $J$ increases (i.e. as $J^{-1}$ decreases) $[U(\theta) + \delta]^J$ (Figure 1b-d) becomes
increasingly peaked around the global maximizers. The parameter $\delta$ is an offset which guarantees
that the equilibrium densities are always strictly positive, even if $U$ takes zero values on some
elements of the domain. The offset $\delta$ is chosen by the user and we show later that our results allow one
to make an optimal selection of $\delta$. The zero-temperature distribution is the limiting distribution, for
$J \to \infty$, which takes non-zero values only on the set of global maximizers. It is denoted by $\pi^{(\infty)}$.
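The sharpening effect of $J$ can be checked numerically. The following is a minimal sketch, assuming a one-dimensional domain $[0, 1]$ discretized on a uniform grid; the criterion `example_U` and the helper `equilibrium_density` are hypothetical names introduced only for this illustration and are not part of the paper.

```python
import math

def example_U(theta):
    """Hypothetical criterion on [0, 1] with values in [0, 1] (not from the paper)."""
    return 0.5 * (1.0 + theta * math.sin(6.0 * math.pi * theta))

def equilibrium_density(U, J, delta, grid):
    """Grid approximation of the density proportional to [U(theta) + delta]^J."""
    weights = [(U(t) + delta) ** J for t in grid]
    step = grid[1] - grid[0]
    total = sum(weights) * step  # Riemann-sum approximation of the normalizing constant
    return [w / total for w in weights]

grid = [i / 1000 for i in range(1001)]
delta = 0.5
# Height of the highest peak of the density for increasing J (decreasing temperature).
peaks = {J: max(equilibrium_density(example_U, J, delta, grid)) for J in (1, 3, 6, 20)}
```

In this sketch the peak height grows with $J$, which is the numerical counterpart of the densities becoming increasingly concentrated around the global maximizers.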

In the generic formulation of the method, the Markov transition kernel of the $k$-th step of the
inhomogeneous chain has equilibrium distribution $\pi^{(J_k)}$ where $\{J_k\}_{k=1,2,\ldots}$ is the "cooling schedule".
The cooling schedule is a non-decreasing sequence of positive numbers according to which the
equilibrium distributions become increasingly sharpened during the evolution of the chain. We use $\theta_k$
to denote the state of the chain and $P_{\theta_k}$ to denote its probability distribution. The distribution $P_{\theta_k}$
obviously depends on the initial condition $\theta_0$. However, in this work, we don't need to make this
dependence explicit in the notation.

Remark 1: If, given an element $\theta$ in $\Theta$, the value $U(\theta)$ can be computed directly, we say that $U$ is
a deterministic criterion, e.g. the energy landscape in protein structure prediction [1]. In problems
involving random variables, the value $U(\theta)$ may be the expected value $U(\theta) = \int g(x, \theta)\, p_x(x; \theta)\, dx$
of some function $g$ which depends on both the optimization variable $\theta$, and on some random variable
$x$ which has probability density $p_x(x; \theta)$ (which may itself depend on $\theta$). In such problems it is
usually not possible to compute $U(\theta)$ directly, either because evaluation of the integral requires
too much computation, or because no analytical expression for $p_x(x; \theta)$ is available. Typically one
must perform stochastic simulations in order to obtain samples of $x$ for a given $\theta$, hence obtain
sample values of $g(x, \theta)$, and thus construct a Monte Carlo estimate of $U(\theta)$. The Bayesian design
of clinical trials is an important application area where such expected-value criteria arise [23]. The
authors of this paper investigate the optimization of expected-value criteria motivated by problems of
aircraft routing [24]. In the particular case that $p_x(x; \theta)$ does not depend on $\theta$, the optimization task
is often called "empirical risk minimization", and is studied extensively in statistical learning theory
[16, 17]. The results of this paper apply in the same way to the optimization of both deterministic and
expected-value criteria. The MCMC method developed by Müller [25, 26] allows one to construct
simulated annealing algorithms for the optimization of expected-value criteria. Müller [25, 26]
employs the same equilibrium distributions as those described in our setting; in his context $J$ is
restricted to integer values.
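The Monte Carlo estimate mentioned in Remark 1 can be sketched as follows. The integrand `g`, the sampler `sample_x` and the function `monte_carlo_U` are hypothetical stand-ins chosen for illustration only; here $x$ is uniform on $[0, 1]$ and does not depend on $\theta$, which is an assumption of the example and not a requirement of the paper.

```python
import random

def g(x, theta):
    """Hypothetical integrand with values in [0, 1] (an assumption for illustration)."""
    return 1.0 if abs(x - theta) < 0.25 else 0.0

def sample_x(theta, rng):
    """Hypothetical sampler for p_x(x; theta): uniform on [0, 1], independent of theta."""
    return rng.random()

def monte_carlo_U(theta, n_samples, rng):
    """Estimate U(theta) = E[g(x, theta)] by averaging g over simulated draws of x."""
    total = sum(g(sample_x(theta, rng), theta) for _ in range(n_samples))
    return total / n_samples

rng = random.Random(0)
# For theta = 0.5 the exact value is P(|x - 0.5| < 0.25) = 0.5.
estimate = monte_carlo_U(0.5, 20000, rng)
```

The estimate is itself random, so any algorithm built on it works with noisy evaluations of $U$.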

In Figure 2 we illustrate the basic iteration of a generic simulated annealing algorithm with
equilibrium distributions $\pi^{(J)}(d\theta)$ for the optimization of deterministic and expected-value criteria.
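For a deterministic criterion, the basic iteration can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a tuned algorithm: the domain is taken to be $[0, 1]$, the proposal is a symmetric Gaussian random walk (so the proposal densities cancel in the acceptance ratio), and the names `anneal_deterministic`, `example_U` and `schedule` are hypothetical.

```python
import math
import random

def anneal_deterministic(U, theta0, schedule, delta, step, rng):
    """Metropolis iteration whose k-th step targets a density proportional to [U + delta]^J_k."""
    theta = theta0
    for J in schedule:
        # Symmetric Gaussian random-walk proposal (an assumed choice of proposal).
        proposal = theta + rng.gauss(0.0, step)
        if not 0.0 <= proposal <= 1.0:
            continue  # the target density is zero outside the domain, so reject
        # Log of the acceptance ratio; the symmetric proposal densities cancel.
        log_ratio = J * (math.log(U(proposal) + delta) - math.log(U(theta) + delta))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = proposal
    return theta

def example_U(theta):
    """Hypothetical criterion with a single global maximizer at 0.7."""
    return math.exp(-50.0 * (theta - 0.7) ** 2)

# Cooling schedule that takes only a finite set of values.
schedule = [1] * 500 + [5] * 500 + [20] * 1000
final_theta = anneal_deterministic(example_U, 0.1, schedule, 0.5, 0.1, random.Random(0))
```

Working with the logarithm of the ratio avoids overflow when $J$ is large; the choice of step size and schedule lengths here is arbitrary.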

2 Convergence 

The rationale of simulated annealing is as follows: if the temperature is kept constant, say $J_k = J$,
then the distribution of the state of the chain $P_{\theta_k}$ tends to the equilibrium distribution $\pi^{(J)}$; if
$J \to \infty$ then the equilibrium distribution $\pi^{(J)}$ tends to the zero-temperature distribution $\pi^{(\infty)}$;
as a result, if the cooling schedule $J_k$ tends to infinity, one obtains that $P_{\theta_k}$ "follows" $\pi^{(J_k)}$ and
that $\pi^{(J_k)}$ tends to $\pi^{(\infty)}$ and eventually that the distribution of the state of the chain $P_{\theta_k}$ tends to
$\pi^{(\infty)}$. The theory shows that, under conditions on the cooling schedule and the Markov transition
kernels, the distribution of the state of the chain $P_{\theta_k}$ actually converges to the target zero-temperature
distribution $\pi^{(\infty)}$ as $k \to \infty$ [12, 13, 14, 15]. Convergence to the zero-temperature distribution
implies that asymptotically the state of the chain eventually coincides with a global optimizer with
probability one.

The difficulty which must be overcome in order to obtain finite step results on simulated annealing
algorithms on a continuous domain is that usually, in an optimization problem defined over
continuous variables, the set of global optimizers has zero Lebesgue measure (e.g. a set of isolated points).
If the set of global optimizers has zero measure then the set of global optimizers has null probability
according to the equilibrium distributions $\pi^{(J)}$ for any finite $J$ and, as a consequence, according to
the distributions $P_{\theta_k}$ for any finite $k$. Put another way, the probability that the state of the chain visits
the set of global optimizers is constantly zero after any finite number of steps. Hence the confidence
that the solution provided by the algorithm in finite time coincides with a global optimizer
is also constantly zero. Notice that this is not the case for a finite domain, where the set of global
optimizers is of non-null measure with respect to the reference counting measure [7, 8, 9, 10].

It is instructive to look at the issue also in terms of the rate of convergence to the target
zero-temperature distribution. On a discrete domain, the distribution of the state of the chain at each
step and the zero-temperature distribution are both standard discrete distributions. It is then possible
to define a distance between them and study the rate of convergence of this distance to zero. This
analysis allows one to obtain results on the finite-time behavior of simulated annealing [7, 8]. On a
continuous domain and for a set of global optimizers of measure zero, the target zero-temperature
distribution $\pi^{(\infty)}$ ends up being a mixture of probability masses on the set of global optimizers. In
this situation, although the distribution of the state of the chain $P_{\theta_k}$ still converges asymptotically
to $\pi^{(\infty)}$, it is not possible to introduce a sensible distance between the two distributions and a rate
of convergence to the target distribution cannot even be defined (weak convergence), see [12, Theorem 3.3].
This is the reason that until now there have been no guarantees on the performance of
simulated annealing on a continuous domain after a finite number of computations: by adopting the
zero-temperature distribution $\pi^{(\infty)}$ as the target distribution it is only possible to prove asymptotic
convergence in infinite time to a global optimizer.

Algorithm I: simulated annealing for a deterministic criterion

Assume that the current state of the chain is $\theta_k$.

1 Propose a new state $\tilde\theta_{k+1} \sim q_{\tilde\theta}(\theta \mid \theta_k)$.

2 Calculate the acceptance probability

$$\rho = \min\left\{ \frac{q_{\tilde\theta}(\theta_k \mid \tilde\theta_{k+1})}{q_{\tilde\theta}(\tilde\theta_{k+1} \mid \theta_k)} \, \frac{[U(\tilde\theta_{k+1}) + \delta]^{J_k}}{[U(\theta_k) + \delta]^{J_k}}, \; 1 \right\}$$

3 With probability $\rho$, accept the proposed state and set $\theta_{k+1} = \tilde\theta_{k+1}$. Otherwise
leave the current state unchanged, i.e. set $\theta_{k+1} = \theta_k$.

Algorithm II: simulated annealing for an expected-value criterion

Assume that the current state of the chain is $\theta_k$ and that $\{x_k^{(j)} \mid j = 1, \ldots, J_k\}$
are $J_k$ independent extractions of $X_k \sim p_x(x; \theta_k)$.

1 Propose a new state $\tilde\theta_{k+1} \sim q_{\tilde\theta}(\theta \mid \theta_k)$ and generate $J_k$ independent extractions
$\{\tilde x_{k+1}^{(j)} \mid j = 1, \ldots, J_k\}$ of $\tilde X_{k+1} \sim p_x(x; \tilde\theta_{k+1})$.

2 Calculate the acceptance probability

$$\rho = \min\left\{ \frac{q_{\tilde\theta}(\theta_k \mid \tilde\theta_{k+1}) \prod_{j=1}^{J_k} \left[ g(\tilde x_{k+1}^{(j)}, \tilde\theta_{k+1}) + \delta \right]}{q_{\tilde\theta}(\tilde\theta_{k+1} \mid \theta_k) \prod_{j=1}^{J_k} \left[ g(x_k^{(j)}, \theta_k) + \delta \right]}, \; 1 \right\}$$

3 With probability $\rho$, accept the proposed state and set $\theta_{k+1} = \tilde\theta_{k+1}$ and
$\{x_{k+1}^{(j)} = \tilde x_{k+1}^{(j)} \mid j = 1, \ldots, J_k\}$. Otherwise leave the current state unchanged,
i.e. set $\theta_{k+1} = \theta_k$ and $\{x_{k+1}^{(j)} = x_k^{(j)} \mid j = 1, \ldots, J_k\}$. If $J_{k+1} > J_k$ then generate
new independent extractions $\{x_{k+1}^{(j)} \mid j = J_k + 1, \ldots, J_{k+1}\}$ of $X_{k+1} \sim p_x(x; \theta_{k+1})$.

Figure 2: The basic iterations of simulated annealing with equilibrium distributions $\pi^{(J)}(d\theta)$, for
the maximization of deterministic and expected-value criteria (see Remark 1). In the algorithms, $q_{\tilde\theta}$
is the density of the "proposal distribution" of the Metropolis step. The iteration for the
expected-value criterion has been proposed by Müller [25, 26].
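The expected-value iteration of Figure 2 can be sketched along the same lines. The code below is only an illustration under assumptions that are not in the paper: the domain is $[0, 1]$, the proposal is a symmetric Gaussian random walk, the schedule takes integer non-decreasing values, and `g`, `sample_x`, `peak` and `anneal_expected_value` are hypothetical names.

```python
import math
import random

def anneal_expected_value(g, sample_x, theta0, schedule, delta, step, rng):
    """Sketch of the expected-value iteration: accept on products of J_k sampled g values."""
    theta = theta0
    xs = [sample_x(theta, rng) for _ in range(schedule[0])]
    for k, J in enumerate(schedule):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal (assumed)
        if 0.0 <= proposal <= 1.0:
            new_xs = [sample_x(proposal, rng) for _ in range(J)]
            ratio = 1.0
            for x_new, x_old in zip(new_xs, xs):
                ratio *= (g(x_new, proposal) + delta) / (g(x_old, theta) + delta)
            if rng.random() < min(1.0, ratio):
                theta, xs = proposal, new_xs
        # If the next step uses more extractions, draw the missing ones at the current state.
        J_next = schedule[k + 1] if k + 1 < len(schedule) else J
        xs += [sample_x(theta, rng) for _ in range(J_next - len(xs))]
    return theta

def peak(theta):
    """Hypothetical expected value U(theta), maximized at 0.7."""
    return math.exp(-50.0 * (theta - 0.7) ** 2)

def g(x, theta):
    """Bernoulli-type integrand: E[g(x, theta)] = peak(theta) when x is uniform on [0, 1]."""
    return 1.0 if x < peak(theta) else 0.0

def sample_x(theta, rng):
    """Hypothetical sampler for p_x(x; theta): uniform on [0, 1]."""
    return rng.random()

schedule = [1] * 300 + [5] * 300 + [20] * 900
final_theta = anneal_expected_value(g, sample_x, 0.1, schedule, 0.5, 0.1, random.Random(0))
```

Note that the criterion $U$ is never evaluated inside the iteration: only sampled values of $g$ enter the acceptance ratio.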

Remark 2: The standard distance between two distributions, say $\mu_1$ and $\mu_2$, on a continuous support
is the total variation norm $\|\mu_1 - \mu_2\|_{TV} = \sup_A |\mu_1(A) - \mu_2(A)|$, see e.g. [21]. In simulated
annealing on a continuous domain the distribution of the state of the chain $P_{\theta_k}$ is absolutely continuous
with respect to the Lebesgue measure (i.e. $\pi_{Leb}(A) = 0 \Rightarrow P_{\theta_k}(A) = 0$), by construction for any
finite $k$. Hence if the set of global optimizers has zero Lebesgue measure then it has zero measure
also according to $P_{\theta_k}$. The set of global optimizers has however measure 1 according to $\pi^{(\infty)}$. The
distance $\|P_{\theta_k} - \pi^{(\infty)}\|_{TV}$ is then constantly 1 for any finite $k$.
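The last step of Remark 2 can be written out explicitly. In the sketch below, $\Theta^*$ denotes the set of global optimizers; this symbol is introduced here only for illustration. Evaluating the supremum at $A = \Theta^*$ gives a lower bound of 1, and the total variation norm as defined above never exceeds 1, so equality holds.

```latex
\[
1 \;\ge\; \|P_{\theta_k} - \pi^{(\infty)}\|_{TV}
  = \sup_{A} \bigl| P_{\theta_k}(A) - \pi^{(\infty)}(A) \bigr|
  \;\ge\; \bigl| P_{\theta_k}(\Theta^*) - \pi^{(\infty)}(\Theta^*) \bigr|
  = |0 - 1| = 1 .
\]
```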

It is also worth mentioning that if the set of global optimizers has zero measure then asymptotic
convergence to the zero-temperature distribution $\pi^{(\infty)}$ can be proven only under the additional
assumptions of continuity and differentiability of $U$ [12, 13, 14, 15].

3 Finite-time guarantees



In general, optimization algorithms for problems defined on continuous variables can only find
approximate solutions in finite time [27]. Given an element $\theta$ of a continuous domain how can we
assess how good it is as an approximate solution to an optimization problem? Here we introduce
the concept of approximate global optimizer to answer this question. The definition is given for
a maximization problem in a continuous but bounded domain. We use two parameters: the value
imprecision $\epsilon$ (greater than or equal to 0) and the residual domain $\alpha$ (between 0 and 1) which
together determine the level of approximation. We say that $\theta$ is an approximate global optimizer of $U$
with value imprecision $\epsilon$ and residual domain $\alpha$ if the function $U$ takes values strictly greater than
$U(\theta) + \epsilon$ only on a subset of values of $\theta$ no larger than an $\alpha$ portion of the optimization domain. The
formal definition is as follows.

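Definition 1 Let $U : \Theta \to \mathbb{R}$ be an optimization criterion where $\Theta \subset \mathbb{R}^N$ is bounded. Let $\pi_{Leb}$
denote the standard Lebesgue measure. Let $\epsilon \ge 0$ and $\alpha \in [0, 1]$ be given numbers. Then $\theta$ is an
approximate global optimizer of $U$ with value imprecision $\epsilon$ and residual domain $\alpha$ if
$\pi_{Leb}\{\theta' \in \Theta : U(\theta') > U(\theta) + \epsilon\} \le \alpha\, \pi_{Leb}(\Theta)$.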

In other words, the value $U(\theta)$ is within $\epsilon$ of a value which is greater than the values that $U$ takes
on at least a $1 - \alpha$ portion of the domain. The smaller $\epsilon$ and $\alpha$ are, the better is the approximation
of a true global optimizer. If both $\alpha$ and $\epsilon$ are equal to zero then $U(\theta)$ coincides with the essential
supremum of $U$.
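Definition 1 can be checked numerically on a simple example. The sketch below assumes a one-dimensional domain $[0, 1]$ and replaces the Lebesgue measure by counting points of a uniform grid; the function name `is_approximate_global_optimizer` and the linear criterion are hypothetical and serve only to illustrate the definition.

```python
def is_approximate_global_optimizer(U, theta, eps, alpha, grid):
    """Check Definition 1, with a uniform grid standing in for the Lebesgue measure."""
    threshold = U(theta) + eps
    # Number of grid points where U exceeds U(theta) + eps.
    better = sum(1 for t in grid if U(t) > threshold)
    return better <= alpha * len(grid)

def linear_U(theta):
    """Hypothetical criterion U(theta) = theta on [0, 1]."""
    return theta

grid = [(i + 0.5) / 1000 for i in range(1000)]
# U exceeds U(0.9) + 0.05 = 0.95 on a 5% portion of the domain, which is within alpha = 10%.
good = is_approximate_global_optimizer(linear_U, 0.9, 0.05, 0.1, grid)
# U exceeds U(0.5) + 0.05 = 0.55 on a 45% portion of the domain, which is more than 10%.
bad = is_approximate_global_optimizer(linear_U, 0.5, 0.05, 0.1, grid)
```

Here `good` evaluates to `True` and `bad` to `False`.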

Our definition of approximate global optimizer carries an important property, which holds regardless
of what the criterion $U$ is: if $\epsilon$ and $\alpha$ have non-zero values then the set of approximate global
optimizers always has non-zero Lebesgue measure. It follows that the probability that the chain
visits the set of approximate global optimizers can be non-zero. Hence, it is sensible to study the
confidence with which the solution found by simulated annealing in finite time is an approximate
global optimizer.

Remark 3: The intuition that our notion of approximate global optimizer can be used to obtain formal
guarantees on the finite-time performance of optimization methods based on a stochastic search of
the domain is already apparent in the work of Vidyasagar [17, 28]. Vidyasagar [17, 28] introduces a
similar definition and obtains rigorous finite-time guarantees in the optimization of expected-value
criteria based on uniform independent sampling of the domain. Notably, the number of independent
samples required to guarantee some desired accuracy and confidence turns out to be polynomial
in the values of the desired imprecision, residual domain and confidence. Although the method of
Vidyasagar is not highly sophisticated, it has had considerable success in solving difficult control
system design applications [28, 29]. Its appeal stems from its rigorous finite-time guarantees which
exist without the need for any particular assumption on the optimization criterion.
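To give a feel for sample counts of this kind, the following sketch covers only the elementary deterministic case with zero value imprecision, which is a simplification and not the bound derived in [17, 28] for expected-value criteria. It relies on the standard fact that a single uniform draw lands in the set of approximate optimizers with residual domain $\alpha$ with probability at least $\alpha$, so the best of $N$ independent draws is such an optimizer with probability at least $1 - (1 - \alpha)^N$. The function name `uniform_sample_size` is hypothetical.

```python
import math

def uniform_sample_size(alpha, confidence):
    """Smallest N with 1 - (1 - alpha)**N >= confidence, for 0 < alpha < 1 and 0 < confidence < 1."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - alpha))

# Number of uniform draws so that the best one is, with confidence 0.99,
# an approximate optimizer with residual domain 0.01 (deterministic U, eps = 0).
n_draws = uniform_sample_size(0.01, 0.99)
```

The count grows roughly like $(1/\alpha) \ln(1/(1 - \text{confidence}))$, so it is polynomial in $1/\alpha$ and logarithmic in the reciprocal of the failure probability.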

Here we show that finite-time guarantees for simulated annealing can be obtained by selecting a
distribution $\pi^{(J)}$ with a finite $J$ as the target distribution in place of the zero-temperature distribution
$\pi^{(\infty)}$. The fundamental result is the following theorem which allows one to select in a rigorous way
$\delta$ and $J$ in the target distribution $\pi^{(J)}$. It is important to stress that the result holds universally for
any optimization criterion $U$ on a bounded domain. The only minor requirement is that $U$ takes
values in $[0, 1]$.

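Theorem 1 Let $U : \Theta \to [0, 1]$ be an optimization criterion where $\Theta \subset \mathbb{R}^N$ is bounded. Let
$J \ge 1$ and $\delta > 0$ be given numbers. Let $\theta$ be a multivariate random variable with distribution
$\pi^{(J)}(d\theta) \propto [U(\theta) + \delta]^J \pi_{Leb}(d\theta)$. Let $\alpha \in (0, 1]$ and $\epsilon \in [0, 1]$ be given numbers and define

$$\sigma = \frac{1}{1 + \left[ \dfrac{1 + \delta}{\epsilon + 1 + \delta} \right]^J \left[ \dfrac{1}{\alpha}\, \dfrac{1 + \delta}{\epsilon + \delta} - 1 \right] \dfrac{1 + \delta}{\delta}} \qquad (1)$$

Then the statement "$\theta$ is an approximate global optimizer of $U$ with value imprecision $\epsilon$ and residual
domain $\alpha$" holds with probability at least $\sigma$.

Proof. See Appendix A.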
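The bound of Theorem 1 is straightforward to evaluate. The sketch below codes the expression for $\sigma$ as it follows from Appendix A, namely $\sigma = 1 / (1 + \rho^J \, [\alpha^{-1} (1+\delta)/(\epsilon+\delta) - 1] \, (1+\delta)/\delta)$ with $\rho = (1+\delta)/(\epsilon+1+\delta)$, and searches for the smallest integer $J$ reaching a desired confidence. The function names `sigma` and `smallest_J`, and the restriction to integer $J$, are assumptions made for illustration.

```python
def sigma(J, delta, eps, alpha):
    """Confidence level of Theorem 1 for given J, delta, eps and alpha."""
    rho_J = ((1.0 + delta) / (eps + 1.0 + delta)) ** J
    bracket = (1.0 / alpha) * (1.0 + delta) / (eps + delta) - 1.0
    return 1.0 / (1.0 + rho_J * bracket * (1.0 + delta) / delta)

def smallest_J(delta, eps, alpha, target, J_max=10**6):
    """Smallest integer J >= 1 with sigma >= target (linear search, illustrative only)."""
    for J in range(1, J_max + 1):
        if sigma(J, delta, eps, alpha) >= target:
            return J
    return None  # target not reached within J_max (e.g. when eps == 0)

# Example: delta = 0.5, eps = 0.1, alpha = 0.1 and a desired confidence of 0.99.
J_star = smallest_J(0.5, 0.1, 0.1, 0.99)
```

For $\epsilon > 0$ the factor $\rho^J$ decays geometrically in $J$, so the required $J$ grows only logarithmically in $1/(1 - \sigma)$.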

The importance of the choice of a target distribution $\pi^{(J)}$ with a finite $J$ is that $\pi^{(J)}$ is absolutely
continuous with respect to the Lebesgue measure. Hence, the distance $\|P_{\theta_k} - \pi^{(J)}\|_{TV}$ between the
distribution of the state of the chain $P_{\theta_k}$ and the target distribution $\pi^{(J)}$ is a meaningful quantity.

Convergence of the Metropolis-Hastings algorithm and MCMC methods in total variation norm is a 
well studied problem. The theory provides simple conditions under which one derives upper bounds 
on the distance to the target distribution which are known at each step of the chain and decrease 
monotonically to zero as the number of steps of the chain grows. The theory has been developed 
mainly for homogeneous chains [18, 19, 20, 21].

In the case of simulated annealing, the factor that enables us to employ these results is the absolute
continuity of the target distribution $\pi^{(J)}$ with respect to the Lebesgue measure. However, simulated
annealing involves the simulation of inhomogeneous chains. In this respect, another important fact
is that the choice of a target distribution $\pi^{(J)}$ with a finite $J$ implies that the inhomogeneous Markov
chain can in fact be formed by a finite sequence of homogeneous chains (i.e. the cooling schedule
$\{J_k\}_{k=1,2,\ldots}$ can be chosen to be a sequence that takes only a finite set of values). In turn, this allows
one to apply the theory of homogeneous MCMC methods to study the convergence of $P_{\theta_k}$ to $\pi^{(J)}$
in total variation norm.

On a bounded domain, simple conditions on the 'proposal distribution' in the iteration of the
simulated annealing algorithm allow one to obtain upper bounds on $\|P_{\theta_k} - \pi^{(J)}\|_{TV}$ that decrease
geometrically to zero as $k \to \infty$, without the need for any additional assumption on $U$ [18, 19, 20, 21].

It is then appropriate to introduce the following finite-time result. 

Theorem 2 Let the notation and assumptions of Theorem 1 hold. Let $\theta_k$, with distribution $P_{\theta_k}$, be
the state of the inhomogeneous chain of a simulated annealing algorithm with target distribution
$\pi^{(J)}$. Then the statement "$\theta_k$ is an approximate global optimizer of $U$ with value imprecision $\epsilon$ and
residual domain $\alpha$" holds with probability at least $\sigma - \|P_{\theta_k} - \pi^{(J)}\|_{TV}$.

The proof of the theorem follows directly from the definition of the total variation norm. 

It follows that if simulated annealing is implemented with an algorithm which converges in total
variation distance to a target distribution $\pi^{(J)}$ with a finite $J$, then one can state with confidence
arbitrarily close to 1 that the solution found by the algorithm after the known appropriate finite
number of steps is an approximate global optimizer with the desired approximation level. For given
non-zero values of $\epsilon$, $\alpha$ the value of $\sigma$ given by (1) can be made arbitrarily close to 1 by choice of
$J$; while the distance $\|P_{\theta_k} - \pi^{(J)}\|_{TV}$ can be made arbitrarily small by taking the known sufficient
number of steps.
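How the two ingredients combine can be sketched numerically. The code below assumes a geometric bound of the form $\|P_{\theta_k} - \pi^{(J)}\|_{TV} \le C r^k$ with known constants $C > 0$ and $0 < r < 1$; these constants are hypothetical placeholders that would have to come from the MCMC convergence theory for a specific proposal, and are not given in the paper. The function names are likewise introduced for illustration.

```python
import math

def steps_for_tv(C, r, tv_target):
    """Smallest k >= 0 with C * r**k <= tv_target, for C > 0, 0 < r < 1, tv_target > 0."""
    if C <= tv_target:
        return 0
    return math.ceil(math.log(tv_target / C) / math.log(r))

def finite_time_confidence(sigma_value, C, r, k):
    """Lower bound of Theorem 2 under the assumed geometric bound C * r**k."""
    return sigma_value - C * r ** k

# Hypothetical numbers: sigma = 0.99 from Theorem 1, and an assumed bound 2 * 0.99**k.
k_needed = steps_for_tv(2.0, 0.99, 0.01)
confidence = finite_time_confidence(0.99, 2.0, 0.99, k_needed)
```

With these placeholder values the overall confidence after `k_needed` steps is at least $0.99 - 0.01 = 0.98$.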

It can be shown that there exists the possibility of making an optimal choice of $\delta$ and $J$ in the target
distribution $\pi^{(J)}$. In fact, for given $\epsilon$ and $\alpha$ and a given value of $J$ there exists an optimal choice
of $\delta$ which maximizes the value of $\sigma$ given by (1). Hence, it is possible to obtain a desired $\sigma$ with
the smallest possible $J$. The advantage of choosing the smallest $J$, consistent with the required
approximation and confidence, is that it will decrease the number of steps required to achieve the
desired reduction of $\|P_{\theta_k} - \pi^{(J)}\|_{TV}$.
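The existence of an optimal offset can be illustrated by a simple grid search. The sketch below evaluates $\sigma = 1 / (1 + \rho^J \, [\alpha^{-1} (1+\delta)/(\epsilon+\delta) - 1] \, (1+\delta)/\delta)$, with $\rho = (1+\delta)/(\epsilon+1+\delta)$ as in Appendix A, over a set of candidate values of $\delta$; the function names and the candidate grid are assumptions made for illustration, and a grid search is only one possible way to locate the maximizer.

```python
def sigma(J, delta, eps, alpha):
    """Confidence level of Theorem 1 for given J, delta, eps and alpha."""
    rho_J = ((1.0 + delta) / (eps + 1.0 + delta)) ** J
    bracket = (1.0 / alpha) * (1.0 + delta) / (eps + delta) - 1.0
    return 1.0 / (1.0 + rho_J * bracket * (1.0 + delta) / delta)

def best_delta(J, eps, alpha, candidates):
    """Candidate offset delta that maximizes sigma for fixed J, eps and alpha."""
    return max(candidates, key=lambda d: sigma(J, d, eps, alpha))

candidates = [0.01 * i for i in range(1, 1001)]  # delta from 0.01 to 10
delta_star = best_delta(100, 0.1, 0.1, candidates)
sigma_star = sigma(100, delta_star, 0.1, 0.1)
```

A very small $\delta$ inflates the factor $(1+\delta)/\delta$, while a very large $\delta$ pushes $\rho$ towards 1, so the maximizer lies strictly inside the candidate range in this example.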

4 Conclusions 

We have introduced a new formulation of simulated annealing which admits rigorous finite-time
guarantees in the optimization of functions of continuous variables. First, we have introduced the
notion of approximate global optimizer. Then, we have shown that simulated annealing is guaranteed
to find approximate global optimizers, with the desired confidence and the desired level of accuracy,
in a known finite number of steps, if a proper choice of the target distribution is made and conditions
for convergence in total variation norm are met. The results hold for any optimization criterion on a
bounded domain with the only minor requirement that it takes values between 0 and 1.

In this framework, simulated annealing algorithms with rigorous finite-time guarantees can be
derived by studying the choice of the proposal distribution and of the cooling schedule, in the generic
iteration of simulated annealing, in order to ensure convergence to the target distribution in total
variation norm. To do this, existing theory of convergence of the Metropolis-Hastings algorithm and
MCMC methods on continuous domains can be used [18, 19, 20, 21].

Vidyasagar [17, 28] has introduced a similar definition of approximate global optimizer and has
shown that approximate optimizers with desired accuracy and confidence can be obtained with a 
number of uniform independent samples of the domain which is polynomial in the accuracy and 
confidence parameters. In general, algorithms developed with the MCMC methodology can be 
expected to be equally or more efficient than uniform independent sampling. 



Acknowledgments 

Work supported by EPSRC, Grant EP/C014006/1, and by the European Commission under projects 
HYGEIA FP6-NEST-4995 and iFly FP6-TREN-037180. We thank S. Brooks, M. Vidyasagar and 
D. M. Wolpert for discussions and useful comments on the paper.



A Proof of Theorem 1

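Let $\bar\alpha \in (0, 1]$ and $\rho \in (0, 1]$ be given numbers. Let $U_\delta(\theta) := U(\theta) + \delta$. Let $\pi_\delta$ be a normalized
measure such that $\pi_\delta(d\theta) \propto U_\delta(\theta)\, \pi_{Leb}(d\theta)$. In the first part of the proof we find a lower bound on
the probability that $\theta$ belongs to the set

$$\{\theta \in \Theta : \pi_\delta\{\theta' \in \Theta : \rho\, U_\delta(\theta') > U_\delta(\theta)\} \le \bar\alpha\}.$$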

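Let $y_{\bar\alpha} := \inf\{y : \pi_\delta\{\theta \in \Theta : U_\delta(\theta) \le y\} \ge 1 - \bar\alpha\}$. To start with we show that the set
$\{\theta \in \Theta : \pi_\delta\{\theta' \in \Theta : \rho\, U_\delta(\theta') > U_\delta(\theta)\} \le \bar\alpha\}$ coincides with $\{\theta \in \Theta : U_\delta(\theta) \ge \rho\, y_{\bar\alpha}\}$. Notice
that the quantity $\pi_\delta\{\theta \in \Theta : U_\delta(\theta) \le y\}$ is a right-continuous non-decreasing function of $y$ because
it has the form of a distribution function (see e.g. [30, p. 162] and [17, Lemma 11.1]). Therefore we
have $\pi_\delta\{\theta \in \Theta : U_\delta(\theta) \le y_{\bar\alpha}\} \ge 1 - \bar\alpha$ and

$$y \ge \rho\, y_{\bar\alpha} \;\Rightarrow\; \pi_\delta\{\theta' \in \Theta : \rho\, U_\delta(\theta') \le y\} \ge 1 - \bar\alpha \;\Rightarrow\; \pi_\delta\{\theta' \in \Theta : \rho\, U_\delta(\theta') > y\} \le \bar\alpha .$$

Moreover,

$$y < \rho\, y_{\bar\alpha} \;\Rightarrow\; \pi_\delta\{\theta' \in \Theta : \rho\, U_\delta(\theta') \le y\} < 1 - \bar\alpha \;\Rightarrow\; \pi_\delta\{\theta' \in \Theta : \rho\, U_\delta(\theta') > y\} > \bar\alpha$$

and taking the contrapositive one obtains

$$\pi_\delta\{\theta' \in \Theta : \rho\, U_\delta(\theta') > y\} \le \bar\alpha \;\Rightarrow\; y \ge \rho\, y_{\bar\alpha} .$$

Therefore $\{\theta \in \Theta : U_\delta(\theta) \ge \rho\, y_{\bar\alpha}\} \equiv \{\theta \in \Theta : \pi_\delta\{\theta' \in \Theta : \rho\, U_\delta(\theta') > U_\delta(\theta)\} \le \bar\alpha\}$.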
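We now derive a lower bound on $\pi^{(J)}\{\theta \in \Theta : U_\delta(\theta) \ge \rho\, y_{\bar\alpha}\}$. Let us introduce the notation
$A_{\bar\alpha} := \{\theta \in \Theta : U_\delta(\theta) < y_{\bar\alpha}\}$, $\bar A_{\bar\alpha} := \{\theta \in \Theta : U_\delta(\theta) \ge y_{\bar\alpha}\}$, $B_{\bar\alpha,\rho} := \{\theta \in \Theta : U_\delta(\theta) < \rho\, y_{\bar\alpha}\}$
and $\bar B_{\bar\alpha,\rho} := \{\theta \in \Theta : U_\delta(\theta) \ge \rho\, y_{\bar\alpha}\}$. Notice that $B_{\bar\alpha,\rho} \subseteq A_{\bar\alpha}$ and $\bar A_{\bar\alpha} \subseteq \bar B_{\bar\alpha,\rho}$. The quantity
$\pi_\delta\{\theta \in \Theta : U_\delta(\theta) < y\}$ as a function of $y$ is the left-continuous version of $\pi_\delta\{\theta \in \Theta : U_\delta(\theta) \le y\}$
[30, p. 162]. Hence, the definition of $y_{\bar\alpha}$ implies $\pi_\delta(A_{\bar\alpha}) \le 1 - \bar\alpha$ and $\pi_\delta(\bar A_{\bar\alpha}) \ge \bar\alpha$. Notice that

$$\pi_\delta(A_{\bar\alpha}) \le 1 - \bar\alpha \;\Rightarrow\; \frac{\delta\, \pi_{Leb}(A_{\bar\alpha})}{\int_\Theta U_\delta(\theta)\, \pi_{Leb}(d\theta)} \le 1 - \bar\alpha ,$$

$$\pi_\delta(\bar A_{\bar\alpha}) \ge \bar\alpha \;\Rightarrow\; \frac{(1 + \delta)\, \pi_{Leb}(\bar A_{\bar\alpha})}{\int_\Theta U_\delta(\theta)\, \pi_{Leb}(d\theta)} \ge \bar\alpha .$$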
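Hence, $\pi_{Leb}(\bar A_{\bar\alpha}) > 0$ and

$$\frac{\pi_{Leb}(A_{\bar\alpha})}{\pi_{Leb}(\bar A_{\bar\alpha})} \le \frac{1 - \bar\alpha}{\bar\alpha}\, \frac{1 + \delta}{\delta} .$$

Notice that $\pi_{Leb}(\bar A_{\bar\alpha}) > 0$ implies $\pi_{Leb}(\bar B_{\bar\alpha,\rho}) > 0$. We obtain

$$\pi^{(J)}\{\theta \in \Theta : U_\delta(\theta) \ge \rho\, y_{\bar\alpha}\}
= \frac{1}{1 + \dfrac{\int_{B_{\bar\alpha,\rho}} U_\delta(\theta)^J \pi_{Leb}(d\theta)}{\int_{\bar B_{\bar\alpha,\rho}} U_\delta(\theta)^J \pi_{Leb}(d\theta)}}
\ge \frac{1}{1 + \dfrac{\int_{B_{\bar\alpha,\rho}} U_\delta(\theta)^J \pi_{Leb}(d\theta)}{\int_{\bar A_{\bar\alpha}} U_\delta(\theta)^J \pi_{Leb}(d\theta)}}$$

$$\ge \frac{1}{1 + \dfrac{\rho^J y_{\bar\alpha}^J\, \pi_{Leb}(B_{\bar\alpha,\rho})}{y_{\bar\alpha}^J\, \pi_{Leb}(\bar A_{\bar\alpha})}}
\ge \frac{1}{1 + \rho^J \dfrac{\pi_{Leb}(A_{\bar\alpha})}{\pi_{Leb}(\bar A_{\bar\alpha})}}
\ge \frac{1}{1 + \rho^J\, \dfrac{1 - \bar\alpha}{\bar\alpha}\, \dfrac{1 + \delta}{\delta}} .$$

Since $\{\theta \in \Theta : U_\delta(\theta) \ge \rho\, y_{\bar\alpha}\} \equiv \{\theta \in \Theta : \pi_\delta\{\theta' \in \Theta : \rho\, U_\delta(\theta') > U_\delta(\theta)\} \le \bar\alpha\}$ the first part of
the proof is complete.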

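In the second part of the proof we show that the set $\{\theta \in \Theta : \pi_\delta\{\theta' \in \Theta : \rho\, U_\delta(\theta') > U_\delta(\theta)\} \le \bar\alpha\}$
is contained in the set of approximate global optimizers of $U$ with value imprecision
$\tilde\epsilon := (\rho^{-1} - 1)(1 + \delta)$ and residual domain $\tilde\alpha := \frac{1 + \delta}{\tilde\epsilon + \delta}\, \bar\alpha$. Hence, we show that
$\{\theta \in \Theta : \pi_\delta\{\theta' \in \Theta : \rho\, U_\delta(\theta') > U_\delta(\theta)\} \le \bar\alpha\} \subseteq \{\theta \in \Theta : \pi_{Leb}\{\theta' \in \Theta : U(\theta') > U(\theta) + \tilde\epsilon\} \le \tilde\alpha\, \pi_{Leb}(\Theta)\}$.
We have

$$U(\theta') > U(\theta) + \tilde\epsilon \;\Leftrightarrow\; \rho\, U_\delta(\theta') > \rho\, [U_\delta(\theta) + \tilde\epsilon] \;\Rightarrow\; \rho\, U_\delta(\theta') > U_\delta(\theta)$$

which is proven by noticing that $\rho\, [U_\delta(\theta) + \tilde\epsilon] \ge U_\delta(\theta) \Leftrightarrow 1 - \rho \ge U(\theta)(1 - \rho)$
and $U(\theta) \in [0, 1]$. Hence $\{\theta' \in \Theta : \rho\, U_\delta(\theta') > U_\delta(\theta)\} \supseteq \{\theta' \in \Theta : U(\theta') > U(\theta) + \tilde\epsilon\}$.
Therefore $\pi_\delta\{\theta' \in \Theta : \rho\, U_\delta(\theta') > U_\delta(\theta)\} \le \bar\alpha \Rightarrow \pi_\delta\{\theta' \in \Theta : U(\theta') > U(\theta) + \tilde\epsilon\} \le \bar\alpha$. Let
$Q_{\theta,\tilde\epsilon} := \{\theta' \in \Theta : U(\theta') > U(\theta) + \tilde\epsilon\}$ and notice that

$$\pi_\delta\{\theta' \in \Theta : U(\theta') > U(\theta) + \tilde\epsilon\} = \frac{\int_{Q_{\theta,\tilde\epsilon}} U(\theta')\, \pi_{Leb}(d\theta') + \delta\, \pi_{Leb}(Q_{\theta,\tilde\epsilon})}{\int_\Theta U(\theta')\, \pi_{Leb}(d\theta') + \delta\, \pi_{Leb}(\Theta)} .$$

We obtain

$$\pi_\delta\{\theta' \in \Theta : U(\theta') > U(\theta) + \tilde\epsilon\} \le \bar\alpha \;\Rightarrow\; \tilde\epsilon\, \pi_{Leb}(Q_{\theta,\tilde\epsilon}) + \delta\, \pi_{Leb}(Q_{\theta,\tilde\epsilon}) \le \bar\alpha\, (1 + \delta)\, \pi_{Leb}(\Theta)$$

$$\Rightarrow\; \pi_{Leb}\{\theta' \in \Theta : U(\theta') > U(\theta) + \tilde\epsilon\} \le \tilde\alpha\, \pi_{Leb}(\Theta) .$$

Hence we can conclude that

$$\pi_\delta\{\theta' \in \Theta : \rho\, U_\delta(\theta') > U_\delta(\theta)\} \le \bar\alpha \;\Rightarrow\; \pi_{Leb}\{\theta' \in \Theta : U(\theta') > U(\theta) + \tilde\epsilon\} \le \tilde\alpha\, \pi_{Leb}(\Theta)$$

and the second part of the proof is complete.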

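We have shown that given $\bar\alpha \in (0, 1]$, $\rho \in (0, 1]$, $\tilde\epsilon := (\rho^{-1} - 1)(1 + \delta)$, $\tilde\alpha := \frac{1 + \delta}{\tilde\epsilon + \delta}\, \bar\alpha$ and

$$\sigma := \frac{1}{1 + \rho^J\, \dfrac{1 - \bar\alpha}{\bar\alpha}\, \dfrac{1 + \delta}{\delta}}
= \frac{1}{1 + \left[ \dfrac{1 + \delta}{\tilde\epsilon + 1 + \delta} \right]^J \left[ \dfrac{1}{\tilde\alpha}\, \dfrac{1 + \delta}{\tilde\epsilon + \delta} - 1 \right] \dfrac{1 + \delta}{\delta}} ,$$

the statement "$\theta$ is an approximate global optimizer of $U$ with value imprecision $\tilde\epsilon$ and residual
domain $\tilde\alpha$" holds with probability at least $\sigma$. Notice that $\tilde\epsilon \in [0, 1]$ and $\tilde\alpha \in (0, 1]$ are linked through
a bijective relation to $\rho \in [\frac{1 + \delta}{2 + \delta}, 1]$ and $\bar\alpha \in (0, \frac{\tilde\epsilon + \delta}{1 + \delta}]$. The statement of the theorem is eventually
obtained by expressing $\sigma$ as a function of desired $\tilde\epsilon = \epsilon$ and $\tilde\alpha = \alpha$. □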